A multiclass/multilabel document categorization system: Combining multiple classifiers in a reduced dimension
Authors
Abstract
This article presents a multiclassifier approach to multiclass/multilabel document categorization problems. For categorization, training and test documents are represented by reduced vectors obtained via SVD, and a set of k-NN classifiers predicts the category of each test document; each k-NN classifier uses a reduced database subsampled from the original training database. A new approach based on Bayesian weighted voting is also presented for multilabel classification. The good...
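The pipeline outlined in the abstract (SVD reduction, k-NN classifiers trained on random subsamples, weighted voting) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not spell out the Bayesian voting rule, so a Laplace-smoothed held-out accuracy stands in for each classifier's weight, and the names `fit_svd`, `knn_predict`, `train_ensemble`, `predict`, the vote `threshold`, and the toy term counts are all assumptions. For brevity each training document carries a single integer category id.

```python
import numpy as np

def fit_svd(X, p):
    """Projection onto the top-p right singular vectors of X (documents x terms)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:p].T  # shape: terms x p

def knn_predict(Z_train, y_train, z, k):
    """Majority label among the k training vectors most cosine-similar to z."""
    norms = np.linalg.norm(Z_train, axis=1) * np.linalg.norm(z) + 1e-12
    sims = (Z_train @ z) / norms
    nearest = np.argsort(-sims)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return int(labels[np.argmax(counts)])

def train_ensemble(X, y, p=2, n_classifiers=5, sample_frac=0.75, k=3, seed=0):
    """Reduce X with SVD, then build k-NN members on random subsamples."""
    rng = np.random.default_rng(seed)
    V = fit_svd(X, p)
    Z = X @ V  # documents in the reduced space
    n = len(X)
    size = max(k, int(sample_frac * n))
    members = []
    for _ in range(n_classifiers):
        idx = rng.choice(n, size=size, replace=False)
        held = np.setdiff1d(np.arange(n), idx)
        # Assumed weighting: smoothed accuracy on the documents left out
        # of this subsample (a stand-in for the paper's Bayesian weights).
        hits = sum(knn_predict(Z[idx], y[idx], Z[h], k) == y[h] for h in held)
        weight = (hits + 1) / (len(held) + 2)
        members.append((idx, weight))
    return V, Z, members

def predict(x, V, Z, y, members, k=3, threshold=0.3):
    """Return every category whose share of the weighted vote reaches threshold."""
    z = x @ V
    votes = {}
    for idx, weight in members:
        label = knn_predict(Z[idx], y[idx], z, k)
        votes[label] = votes.get(label, 0.0) + weight
    total = sum(votes.values())
    return sorted(c for c, v in votes.items() if v / total >= threshold)

# Toy corpus: four documents per category, each using a disjoint pair of terms.
X = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [3, 1, 0, 0], [1, 3, 0, 0],
              [0, 0, 2, 1], [0, 0, 1, 2], [0, 0, 3, 1], [0, 0, 1, 3]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

V, Z, members = train_ensemble(X, y)
print(predict(np.array([2.0, 2.0, 0.0, 0.0]), V, Z, y, members))
```

Returning every category above a vote-share threshold, rather than only the top one, is what lets a single document receive several labels; the threshold value itself would need tuning on real data.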
Similar resources
Text mining and topic models
Classifiers for documents are useful for many applications. Major uses for binary classifiers include spam detection and personalization of streams of news articles. Multiclass classifiers are useful for routing messages to recipients. Most classifiers for documents are designed to categorize according to subject matter. However, it is also possible to learn to categorize according to qualitati...
A Multiclassifier based Document Categorization System: profiting from the Singular Value Decomposition Dimensionality Reduction Technique
In this paper we present a multiclassifier approach for multilabel document classification problems, where a set of k-NN classifiers is used to predict the category of text documents based on different training subsampling databases. These databases are obtained from the original training database by random subsampling. In order to combine the predictions generated by the multiclassifier, Bayes...
Efficient Pairwise Multilabel Classification for Large-Scale Problems in the Legal Domain
In this paper we applied multilabel classification algorithms to the EUR-Lex database of legal documents of the European Union. On this document collection, we studied three different multilabel classification problems, the largest being the categorization into the EUROVOC concept hierarchy with almost 4000 classes. We evaluated three algorithms: (i) the binary relevance approach which independ...
Cranking: Combining Rankings Using Conditional Probability Models on Permutations
A new approach to ensemble learning is introduced that takes ranking rather than classification as fundamental, leading to models on the symmetric group and its cosets. The approach uses a generalization of the Mallows model on permutations to combine multiple input rankings. Applications include the task of combining the output of multiple search engines and multiclass or multilabel classifica...
Sparse Max-Margin Multiclass and Multi-label Classifier Design for Fast Inference
We address the problems of sparse multiclass and multilabel classifier design and devise new algorithms using margin based ideas. Many online applications such as image classification or text categorization demand fast inference. State-of-the-art classifiers such as Support Vector Machines (SVM) are not preferred in such applications because of slow inference, which is mainly due to the large n...
Journal title:
- Appl. Soft Comput.
Volume 11, Issue
Pages -
Publication date 2011